{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# 稠密连接网络（DenseNet）\n",
    "\n",
    "ResNet极大地改变了如何参数化深层网络中函数的观点。\n",
    "*稠密连接网络* (DenseNet） :cite:`Huang.Liu.Van-Der-Maaten.ea.2017` 在某种程度上是 ResNet 的逻辑扩展。让我们先从数学上了解一下。\n",
    "\n",
    "\n",
    "## 从ResNet到DenseNet\n",
    "\n",
    "回想一下任意函数的泰勒展开式（Taylor expansion），它把这个函数分解成越来越高阶的项。在 $x$ 接近 0 时，\n",
    "\n",
    "$$f(x) = f(0) + f'(0) x + \\frac{f''(0)}{2!}  x^2 + \\frac{f'''(0)}{3!}  x^3 + \\ldots.$$\n",
    "\n",
    "同样，ResNet 将函数展开为\n",
    "\n",
    "$$f(\\mathbf{x}) = \\mathbf{x} + g(\\mathbf{x}).$$\n",
    "\n",
    "也就是说，ResNet 将 $f$ 分解为两部分：一个简单的线性项和一个更复杂的非线性项。\n",
    "那么再向前拓展一步，如果我们想将 $f$ 拓展成超过两部分的信息呢？\n",
    "一种方案便是 DenseNet。\n",
    "\n",
    "![ResNet（左）与 DenseNet（右）在跨层连接上的主要区别：使用相加和使用连结。](../img/densenet-block.svg)\n",
    ":label:`fig_densenet_block`\n",
    "\n",
    "如 :numref:`fig_densenet_block` 所示，ResNet 和 DenseNet 的关键区别在于，DenseNet 输出是*连接*（用图中的 $[,]$ 表示）而不是如 ResNet 的简单相加。\n",
    "因此，在应用越来越复杂的函数序列后，我们执行从 $\\mathbf{x}$ 到其展开式的映射：\n",
    "\n",
    "$$\\mathbf{x} \\to \\left[\n",
    "\\mathbf{x},\n",
    "f_1(\\mathbf{x}),\n",
    "f_2([\\mathbf{x}, f_1(\\mathbf{x})]), f_3([\\mathbf{x}, f_1(\\mathbf{x}), f_2([\\mathbf{x}, f_1(\\mathbf{x})])]), \\ldots\\right].$$\n",
    "\n",
    "最后，将这些展开式结合到多层感知机中，再次减少特征的数量。\n",
    "实现起来非常简单：我们不需要添加术语，而是将它们连接起来。\n",
    "DenseNet 这个名字由变量之间的“稠密连接”而得来，最后一层与之前的所有层紧密相连。\n",
    "稠密连接如 :numref:`fig_densenet` 所示。\n",
    "\n",
    "![稠密连接。](../img/densenet.svg)\n",
    ":label:`fig_densenet`\n",
    "\n",
    "稠密网络主要由 2 部分构成： *稠密块*（dense block）和 *过渡层* （transition layer）。\n",
    "前者定义如何连接输入和输出，而后者则控制通道数量，使其不会太复杂。\n",
    "\n",
    "\n",
    "## (**稠密块体**)\n",
    "\n",
    "DenseNet 使用了 ResNet 改良版的“批量归一化、激活和卷积”结构（参见 :numref:`sec_resnet` 中的练习）。\n",
    "我们首先实现一下这个结构。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import paddle\n",
    "import paddle.nn as nn\n",
    "\n",
    "def conv_block(input_channels, num_channels):\n",
    "    return nn.Sequential(\n",
    "        nn.BatchNorm2D(input_channels), nn.ReLU(),\n",
    "        nn.Conv2D(input_channels, num_channels, kernel_size=3, padding=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "一个*稠密块*由多个卷积块组成，每个卷积块使用相同数量的输出信道。\n",
    "然而，在前向传播中，我们将每个卷积块的输入和输出在通道维上连结。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class DenseBlock(nn.Layer):\n",
    "    def __init__(self, num_convs, input_channels, num_channels):\n",
    "        super(DenseBlock, self).__init__()\n",
    "        layer = []\n",
    "        for i in range(num_convs):\n",
    "            layer.append(\n",
    "                conv_block(num_channels * i + input_channels, num_channels))\n",
    "        self.net = nn.Sequential(*layer)\n",
    "\n",
    "    def forward(self, X):\n",
    "        for blk in self.net:\n",
    "            Y = blk(X)\n",
    "            # 连接通道维度上每个块的输入和输出\n",
    "            X = paddle.concat(x=[X, Y], axis=1)\n",
    "        return X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "在下面的例子中，我们[**定义一个**]有 2 个输出通道数为 10 的 (**`DenseBlock`**)。\n",
    "使用通道数为 3 的输入时，我们会得到通道数为 $3+2\\times 10=23$ 的输出。\n",
    "卷积块的通道数控制了输出通道数相对于输入通道数的增长，因此也被称为*增长率*（growth rate）。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[4, 23, 8, 8]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blk = DenseBlock(2, 3, 10)\n",
    "X = paddle.randn([4, 3, 8, 8])\n",
    "Y = blk(X)\n",
    "Y.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## [**过渡层**]\n",
    "\n",
    "由于每个稠密块都会带来通道数的增加，使用过多则会过于复杂化模型。\n",
    "而过渡层可以用来控制模型复杂度。\n",
    "它通过 $1\\times 1$ 卷积层来减小通道数，并使用步幅为 2 的平均池化层减半高和宽，从而进一步降低模型复杂度。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def transition_block(input_channels, num_channels):\n",
    "    return nn.Sequential(\n",
    "        nn.BatchNorm2D(input_channels), nn.ReLU(),\n",
    "        nn.Conv2D(input_channels, num_channels, kernel_size=1),\n",
    "        nn.AvgPool2D(kernel_size=2, stride=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "对上一个例子中稠密块的输出[**使用**]通道数为 10 的[**过渡层**]。\n",
    "此时输出的通道数减为 10，高和宽均减半。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[4, 10, 4, 4]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blk = transition_block(23, 10)\n",
    "blk(Y).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## [**DenseNet模型**]\n",
    "\n",
    "我们来构造 DenseNet 模型。DenseNet 首先使用同 ResNet 一样的单卷积层和最大池化层。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "b1 = nn.Sequential(nn.Conv2D(1, 64, kernel_size=7, stride=2, padding=3),\n",
    "                   nn.BatchNorm2D(64), nn.ReLU(),\n",
    "                   nn.MaxPool2D(kernel_size=3, stride=2, padding=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "接下来，类似于 ResNet 使用的 4 个残差块，DenseNet 使用的是 4 个稠密块。\n",
    "与 ResNet 类似，我们可以设置每个稠密块使用多少个卷积层。\n",
    "这里我们设成 4，从而与 :numref:`sec_resnet` 的 ResNet-18 保持一致。\n",
    "稠密块里的卷积层通道数（即增长率）设为 32，所以每个稠密块将增加 128 个通道。\n",
    "\n",
    "在每个模块之间，ResNet 通过步幅为 2 的残差块减小高和宽，DenseNet 则使用过渡层来减半高和宽，并减半通道数。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# `num_channels`为当前的通道数\n",
    "num_channels, growth_rate = 64, 32\n",
    "num_convs_in_dense_blocks = [4, 4, 4, 4]\n",
    "blks = []\n",
    "for i, num_convs in enumerate(num_convs_in_dense_blocks):\n",
    "    blks.append(DenseBlock(num_convs, num_channels, growth_rate))\n",
    "    # 上一个稠密块的输出通道数\n",
    "    num_channels += num_convs * growth_rate\n",
    "    # 在稠密块之间添加一个转换层，使通道数量减半\n",
    "    if i != len(num_convs_in_dense_blocks) - 1:\n",
    "        blks.append(transition_block(num_channels, num_channels // 2))\n",
    "        num_channels = num_channels // 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "与 ResNet 类似，最后接上全局池化层和全连接层来输出结果。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DenseNet = nn.Sequential(b1, *blks, nn.BatchNorm2D(num_channels), nn.ReLU(),\n",
    "                    nn.AdaptiveMaxPool2D((1, 1)), nn.Flatten(),\n",
    "                    nn.Linear(num_channels, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## [**训练模型**]\n",
    "\n",
    "由于这里使用了比较深的网络，本节里我们将输入高和宽从 224 降到 96 来简化计算。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Cache file /home/aistudio/.cache/paddle/dataset/fashion-mnist/train-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/fashion_mnist/train-images-idx3-ubyte.gz \n",
      "Begin to download\n",
      "\n",
      "Download finished\n",
      "Cache file /home/aistudio/.cache/paddle/dataset/fashion-mnist/train-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/fashion_mnist/train-labels-idx1-ubyte.gz \n",
      "Begin to download\n",
      "........\n",
      "Download finished\n",
      "Cache file /home/aistudio/.cache/paddle/dataset/fashion-mnist/t10k-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/fashion_mnist/t10k-images-idx3-ubyte.gz \n",
      "Begin to download\n",
      "\n",
      "Download finished\n",
      "Cache file /home/aistudio/.cache/paddle/dataset/fashion-mnist/t10k-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/fashion_mnist/t10k-labels-idx1-ubyte.gz \n",
      "Begin to download\n",
      "..\n",
      "Download finished\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
      "Epoch 1/10\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working\n",
      "  return (isinstance(seq, collections.Sequence) and\n",
      "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:641: UserWarning: When training, we now always track global mean and variance.\n",
      "  \"When training, we now always track global mean and variance.\")\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "step   1/235 - loss: 3.4109 - acc_top1: 0.0859 - acc_top5: 0.5273 - 10s/step\n",
      "step   2/235 - loss: 11.8268 - acc_top1: 0.1055 - acc_top5: 0.5508 - 9s/step\n",
      "step   3/235 - loss: 10.2840 - acc_top1: 0.1211 - acc_top5: 0.5286 - 9s/step\n",
      "step   4/235 - loss: 6.6358 - acc_top1: 0.1260 - acc_top5: 0.5312 - 9s/step\n",
      "step   5/235 - loss: 7.6157 - acc_top1: 0.1250 - acc_top5: 0.5258 - 9s/step\n",
      "step   6/235 - loss: 3.3201 - acc_top1: 0.1309 - acc_top5: 0.5312 - 9s/step\n",
      "step   7/235 - loss: 2.5737 - acc_top1: 0.1323 - acc_top5: 0.5273 - 9s/step\n",
      "step   8/235 - loss: 2.5586 - acc_top1: 0.1348 - acc_top5: 0.5322 - 9s/step\n",
      "step   9/235 - loss: 2.5062 - acc_top1: 0.1354 - acc_top5: 0.5347 - 9s/step\n",
      "step  10/235 - loss: 2.2328 - acc_top1: 0.1375 - acc_top5: 0.5289 - 9s/step\n",
      "step  11/235 - loss: 2.2632 - acc_top1: 0.1378 - acc_top5: 0.5291 - 9s/step\n",
      "step  12/235 - loss: 2.2602 - acc_top1: 0.1396 - acc_top5: 0.5309 - 9s/step\n"
     ]
    }
   ],
   "source": [
    "import paddle.vision.transforms as T\n",
    "from paddle.vision.datasets import FashionMNIST\n",
    "\n",
    "lr, num_epochs, batch_size = 0.1, 10, 256\n",
    "\n",
    "# 数据集处理\n",
    "transform = T.Compose([\n",
    "    T.Resize(96),\n",
    "    T.Transpose(),\n",
    "    T.Normalize([127.5], [127.5]),\n",
    "])\n",
    "# 数据集定义\n",
    "train_dataset = FashionMNIST(mode='train', transform=transform)\n",
    "val_dataset = FashionMNIST(mode='test', transform=transform)\n",
    "\n",
    "model = paddle.Model(DenseNet)\n",
    "model.prepare(\n",
    "    paddle.optimizer.Adam(learning_rate=lr, parameters=model.parameters()),\n",
    "    paddle.nn.CrossEntropyLoss(),\n",
    "    paddle.metric.Accuracy(topk=(1, 5)))\n",
    "# 模型训练\n",
    "model.fit(train_dataset, val_dataset, epochs=num_epochs, batch_size=batch_size, log_freq=200)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## 小结\n",
    "\n",
    "* 在跨层连接上，不同于 ResNet 中将输入与输出相加，稠密连接网络（DenseNet）在通道维上连结输入与输出。\n",
    "* DenseNet 的主要构建模块是稠密块和过渡层。\n",
    "* 在构建 DenseNet 时，我们需要通过添加过渡层来控制网络的维数，从而再次减少信道的数量。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "1. 为什么我们在过渡层使用平均池化层而不是最大池化层？\n",
    "1. DenseNet 的优点之一是其模型参数比 ResNet 小。为什么呢？\n",
    "1. DenseNet 一个诟病的问题是内存或显存消耗过多。\n",
    "    1. 真的是这样吗？可以把输入形状换成 $224 \\times 224$ ，来看看实际的显存消耗。\n",
    "    1. 你能想出另一种方法来减少显存消耗吗？你需要如何改变框架？\n",
    "1. 实现 DenseNet 论文 :cite:`Huang.Liu.Van-Der-Maaten.ea.2017` 表 1 所示的不同 DenseNet 版本。\n",
    "1. 应用DenseNet的思想设计一个基于多层感知机的模型。将其应用于 :numref:`sec_kaggle_house` 中的房价预测任务。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "[Discussions](https://discuss.d2l.ai/t/1880)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PaddlePaddle 2.1.0 (Python 3.5)",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
